Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

Targeted Gene Metagenomic Data Analysis ◾ 283

perform quality filtering and chimera removal. You may not need to perform any quality

control prior to using either of these two methods. Only for Deblur denoising, you may need to

merge paired-end reads as we did for the clustering.

7.3.4.2.2.1 Denoising with DADA2

The “q2-dada2” plugin is used to denoise single-end and paired-end reads (no need for

read merging). Use the “denoise-single” method for single-end reads and the “denoise-paired”

method for paired-end reads. You can always use “--help” with any of the methods to

display the usage and options. DADA2 methods denoise sequences, dereplicate them, and

filter chimeras.

qiime dada2 denoise-single --help

qiime dada2 denoise-paired --help

Now, we can use “q2-dada2” to denoise the yoga data that we imported and saved as

“demux-yoga.qza”. We will use the “denoise-paired” method. To keep the files organized, we

will create the “dada2” subdirectory for DADA2 denoising files.

mkdir dada2

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs inputs/demux-yoga.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 250 \
  --p-trunc-len-r 250 \
  --p-n-threads 4 \
  --o-representative-sequences dada2/rep-seqs_yoga_dada2.qza \
  --o-table dada2/table_yoga_dada2.qza \
  --o-denoising-stats dada2/stats_yoga_dada2.qza
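
As a rough check on the trim and truncation values passed in the command above, the following sketch works out how many bases each read keeps and how many bases the truncated forward and reverse reads would still share for merging. The amplicon length (460) is a hypothetical figure chosen only for illustration and is not taken from the yoga data. The minimum overlap of 12 bases is likewise an assumption that should be checked against the DADA2 version in use. The arithmetic assumes truncation is enabled (non-zero values).

```shell
#!/bin/sh
# Rough arithmetic for DADA2 trim/truncation settings (illustrative only).

TRIM_LEFT_F=0      # value passed to --p-trim-left-f
TRIM_LEFT_R=0      # value passed to --p-trim-left-r
TRUNC_LEN_F=250    # value passed to --p-trunc-len-f
TRUNC_LEN_R=250    # value passed to --p-trunc-len-r
AMPLICON_LEN=460   # ASSUMPTION: hypothetical amplicon length in bases
MIN_OVERLAP=12     # ASSUMPTION: minimum overlap needed to merge a read pair

# Bases kept per read: everything from the trim position up to the
# truncation position (assumes a non-zero truncation length).
retained_len() {   # usage: retained_len TRUNC_LEN TRIM_LEFT
    echo $(( $1 - $2 ))
}

# Bases shared by the truncated forward and reverse reads. Left trimming
# removes bases from the outer ends of the amplicon, so it does not
# change the overlap.
overlap_len() {    # usage: overlap_len TRUNC_F TRUNC_R AMPLICON_LEN
    echo $(( $1 + $2 - $3 ))
}

KEPT_F=$(retained_len "$TRUNC_LEN_F" "$TRIM_LEFT_F")
KEPT_R=$(retained_len "$TRUNC_LEN_R" "$TRIM_LEFT_R")
OVERLAP=$(overlap_len "$TRUNC_LEN_F" "$TRUNC_LEN_R" "$AMPLICON_LEN")

echo "forward bases kept: $KEPT_F"
echo "reverse bases kept: $KEPT_R"
echo "expected overlap:   $OVERLAP"

if [ "$OVERLAP" -ge "$MIN_OVERLAP" ]; then
    echo "overlap looks sufficient for merging"
else
    echo "overlap may be too short; consider longer truncation lengths"
fi
```

Substituting the real amplicon length for the target region shows quickly whether a more aggressive truncation would leave too little overlap for the read pairs to merge.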

The parameters “--p-trim-left-f” and “--p-trim-left-r” remove bases from the start (5′ end) of the

forward and reverse reads, and “--p-trunc-len-f” and “--p-trunc-len-r” truncate the forward and

reverse reads at the given position, to improve the quality of the reads if required. The trimming

parameters are optional (default 0), whereas the truncation lengths must always be supplied. If you use

“--p-trunc-len-f 0” and “--p-trunc-len-r 0”, the truncation will be disabled. The parameter “--p-n-threads”

specifies the number of threads used for denoising. If the “denoise-single” method is used

with paired-end reads instead of “denoise-paired”, only forward reads will be used as input

while the reverse reads will be ignored.
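
For comparison, a forward-reads-only run with “denoise-single” might look like the sketch below. The main difference is that it takes a single “--p-trim-left” and a single “--p-trunc-len” value rather than separate forward and reverse values. The output directory “dada2-single” and the output file names are hypothetical and chosen only for illustration. The values 0, 250, and 4 simply mirror the paired-end command. The run is guarded so that nothing is executed unless QIIME 2 and the input artifact are present.

```shell
#!/bin/sh
# Sketch: denoise only the forward reads with "denoise-single".
# The output directory and file names below are hypothetical.

DEMUX="inputs/demux-yoga.qza"      # demultiplexed reads imported earlier
OUT_DIR="dada2-single"             # hypothetical output directory
REP_SEQS="$OUT_DIR/rep-seqs_yoga_dada2_single.qza"
TABLE="$OUT_DIR/table_yoga_dada2_single.qza"
STATS="$OUT_DIR/stats_yoga_dada2_single.qza"

run_denoise_single() {
    # denoise-single takes one trim and one truncation value
    # (no -f/-r suffix, because only one read direction is used)
    qiime dada2 denoise-single \
        --i-demultiplexed-seqs "$DEMUX" \
        --p-trim-left 0 \
        --p-trunc-len 250 \
        --p-n-threads 4 \
        --o-representative-sequences "$REP_SEQS" \
        --o-table "$TABLE" \
        --o-denoising-stats "$STATS"
}

# Run only when QIIME 2 and the input artifact are available.
if command -v qiime >/dev/null 2>&1 && [ -f "$DEMUX" ]; then
    mkdir -p "$OUT_DIR"
    if run_denoise_single; then
        STATUS="ran"
    else
        STATUS="failed"
    fi
else
    STATUS="skipped"
    echo "qiime or $DEMUX not found; nothing was run" >&2
fi

echo "denoise-single status: $STATUS"
```

Writing to a separate directory keeps the single-end outputs from being confused with the paired-end artifacts in “dada2”.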

We set “--p-trunc-len-f 250” and “--p-trunc-len-r 250” to truncate the forward and

reverse reads to 250 bases. We did not trim the left ends of the reads because they

do not need trimming. As with clustering, the DADA2 feature table and representative sequences

artifacts are used in downstream analyses such as phylogeny, diversity analysis, and taxonomy

assignment. Moreover, the DADA2 stats summary artifact contains useful information

regarding the filtering and denoising. Users can use “q2-metadata tabulate” to generate a

visualization file that can be displayed in a web browser with “tools view” as follows: